Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms

Author: Alonso Gustavo
Klimovic Ana
Koutsoukos Dimitrios
Marroquín Renato
Müller Ingo
Publication date: 01/09/2021

The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasingly required due to the speed at which algorithms and hardware evolve. To address this limitation, we present Modularis, an execution layer for data analytics based on sub-operators, i.e., composable building blocks resembling traditional database operators but at a finer granularity. To demonstrate the advantages of our approach, we use Modularis to build a distributed query processing system supporting relational queries running on an RDMA cluster, a serverless cloud platform, and a smart storage engine. Modularis requires minimal code changes to execute queries across these three diverse hardware platforms, showing that the sub-operator approach reduces the amount and complexity of the code. In fact, changes in the platform affect only sub-operators that depend on the underlying hardware. We show the end-to-end performance of Modularis by comparing it with a framework for SQL processing (Presto), a commercial cluster database (SingleStore), as well as Query-as-a-Service systems (Athena, BigQuery). Modularis outperforms all these systems, proving that the design and architectural advantages of a modular design can be achieved without degrading performance. We also compare Modularis with a hand-optimized implementation of a join for RDMA clusters. We show that Modularis has the advantage of being easily extensible to a wider range of join variants and group by queries, all of which are not supported in the hand-tuned join.

Comment: Accepted at PVLDB vol. 1

arXiv.org e-Print Archive

Repository for Publications and Research Data

Enabling In-Vitro Serverless Systems Research

Author: Cvetković Lazar
Djokic Mihajlo
Grot Boris
Hè Hongyu
Klimovic Ana
Park Dohyun
Ustiugov Dmitrii
Publication date: 23/10/2023

Edinburgh Research Explorer

SHiFT: An Efficient, Flexible Search Engine for Transfer Learning

Author: Klimovic Ana
Kolar Luka
Renggli Cedric
Rimanic Luka
Yao Xiaozhe
Zhang Ce
Publication date: 04/04/2022

Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch. The emergence of rich model repositories, such as TensorFlow Hub, enables practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand becomes paramount. By carefully comparing various selection and search strategies, we realize that no single method outperforms the others, and hybrid or mixed strategies can be beneficial. Therefore, we propose SHiFT, the first downstream task-aware, flexible, and efficient model search engine for transfer learning. These properties are enabled by a custom query language SHiFT-QL together with a cost-based decision maker, which we empirically validate. Motivated by the iterative nature of machine learning development, we further support efficient incremental executions of our queries, which requires a careful implementation when jointly used with our optimizations.

arXiv.org e-Print Archive

IX Open-source version 1.1 - Deployment and Evaluation Guide

Author: Belay Adam
Bugnion Edouard
Grossman Samuel
Gütermann Bernard
Klimovic Ana
Kogias Marios
Kozyrakis Christos
Prekas George
Primorac Mia
Publication date: 27/05/2016

This Technical Report provides the deployment and evaluation guide of the IX dataplane operating system, as of its first open-source release on May 27, 2016. To facilitate the reproduction of our results, we include in this report the precise steps needed to install, deploy and configure IX and its workloads. We reproduce all benchmarks previously published in two peer-reviewed publications at OSDI '14 and SoCC '15 using this up-to-date, open-source code base.

Infoscience - École polytechnique fédérale de Lausanne

A survey and classification of software-defined storage systems

Author: Alysson Bessani
Angel Sebastian
Anwar Ali
Belaramani Nalini M.
Belay Adam
Carl
Cully Brendan
Frank
Ghodsi Ali
Gracia-Tinedo Raúl
Gulati Ajay
Hat Red
Hsu Chin-Jung
Hunt Patrick
José Pereira
João Paulo
Kim Hyeong-Jun
Klimovic Ana
Koponen Teemu
Li Ning
Lumb Christopher R.
Mace Jonathan
Mesnier Michael
Murugan Muthukumar
Ongaro Diego
Peter Simon
Qian Yingjin
Raghavan Ajaykrishna
Ricardo Macedo
Riedel Erik
Schroeder Bianca
Schwan Philip
Seshadri Sudharsan
Sevilla Michael A.
Shan Yizhou
Shue David
Soheil
Song Huaiming
Stefanovici Ioan
Weil Sage A.
Wires Jake
Yang Bin
Yang Suli
Zhang Xuechen
Zhu Timothy
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

The exponential growth of digital information is imposing increasing scale and efficiency demands on modern storage infrastructures. As infrastructure complexity increases, so does the difficulty in ensuring quality of service, maintainability, and resource fairness, raising unprecedented performance, scalability, and programmability challenges. Software-Defined Storage (SDS) addresses these challenges by cleanly disentangling control and data flows, easing management, and improving control functionality of conventional storage systems. Despite its momentum in the research community, many aspects of the paradigm are still unclear, undefined, and unexplored, leading to misunderstandings that hamper the research and development of novel SDS technologies. In this article, we present an in-depth study of SDS systems, providing a thorough description and categorization of each plane of functionality. Further, we propose a taxonomy and classification of existing SDS solutions according to different criteria. Finally, we provide key insights about the paradigm and discuss potential future research directions for the field.

This work was financed by the Portuguese funding agency FCT-Fundacao para a Ciencia e a Tecnologia through national funds, the PhD grant SFRH/BD/146059/2019, the project ThreatAdapt (FCT-FNR/0002/2018), the LASIGE Research Unit (UIDB/00408/2020), and cofunded by the FEDER, where applicable.

Universidade do Minho: RepositoriUM

Crossref

ReFlex

Author: Ana Klimovic
Christos Kozyrakis
Heiner Litz
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Analyzing Vectorized Hash Tables Across CPU Architectures

Author: Benson Lawrence
Böther Maximilian
Klimovic Ana
Rabl Tilmann
Publication venue: Association for Computing Machinery
Publication date: 01/07/2023

Data processing systems often leverage vector instructions to achieve higher performance. When applying vector instructions, an often overlooked data structure is the hash table, even though it is fundamental in data processing systems for operations such as indexing, aggregating, and joining. In this paper, we characterize and evaluate three fundamental vectorized hashing schemes, vectorized linear probing (VLP), vectorized fingerprinting (VFP), and bucket-based comparison (BBC). We implement these hashing schemes on the x86, ARM, and Power CPU architectures, as modern database systems must provide efficient implementations for multiple platforms due to the continuously increasing hardware heterogeneity. We present various implementation variants and platform-specific optimizations, which we evaluate for integer keys, string keys, large payloads, skewed distributions, and multiple threads. Our extensive evaluation and comparison to three scalar hashing schemes on four servers shows that BBC outperforms scalar linear probing by a factor of more than 2x, while also scaling well to high load factors. We find that vectorized hashing schemes come with caveats that need to be considered, such as the increased engineering overhead, differences between CPUs, and differences between vector ISAs, such as AVX and AVX-512, which impact performance. We conclude with key findings for vectorized hashing scheme implementations.

ISSN: 2150-809

Repository for Publications and Research Data

IX: A Protected Dataplane Operating System for High Throughput and Low Latency (Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation)

Author: Adam Belay
Ana Klimovic
Christos Kozyrakis
Edouard Bugnion
George Prekas
Samuel Grossman
Publication date: 03/04/2020

The conventional wisdom is that aggressive networking requirements, such as high packet rates for small messages and microsecond-scale tail latency, are best addressed outside the kernel, in a user-level networking stack. We present IX, a dataplane operating system that provides high I/O performance, while maintaining the key advantage of strong protection offered by existing kernels. IX uses hardware virtualization to separate management and scheduling functions of the kernel (control plane) from network processing (dataplane). The dataplane architecture builds upon a native, zero-copy API and optimizes for both bandwidth and latency by dedicating hardware threads and networking queues to dataplane instances, processing bounded batches of packets to completion, and by eliminating coherence traffic and multi-core synchronization. We demonstrate that IX outperforms Linux and state-of-the-art, user-space network stacks significantly in both throughput and end-to-end latency. Moreover, IX improves the throughput of a widely deployed, key-value store by up to 3.6× and reduces tail latency by more than 2×.

CiteSeerX

NVM: Is it Not Very Meaningful for Databases?

Author: Alonso Gustavo
Bhartia Raghav
Friedman Michal
Klimovic Ana
Koutsoukos Dimitrios
Publication venue: Association for Computing Machinery
Publication date: 01/06/2023

Persistent or Non-Volatile Memory (PMEM) offers expanded memory capacity and faster access to persistent storage. However, there is no comprehensive empirical analysis of existing database engines under different PMEM modes, to understand how databases can benefit from the various hardware configurations. To this end, we analyze multiple different engines under common benchmarks with PMEM in AppDirect mode and Memory mode. Our results show that PMEM in Memory mode does not offer any clear performance advantage despite the larger volatile memory capacity. Also, using PMEM as persistent storage usually speeds up query execution, but with some caveats as the I/O path is not fully optimized and therefore does not always justify the additional cost. We show this to be the case through a comprehensive evaluation of different engines and database configurations under different workloads.

ISSN: 2150-809

Repository for Publications and Research Data